Automatic Indexing by Discipline and High-Level Categories: Methodology and Potential Applications

Authors

  • Susanne M. Humphrey
  • Thomas C. Rindflesch
  • Alan R. Aronson
Abstract

This paper first describes the methodology of journal descriptor (JD) indexing, based on human indexing at the journal level using only 127 descriptors, and applying statistical methods that associate this journal indexing with text words in a training set of MEDLINE® citations. These associations form the basis for automatic indexing of documents outside the training set. The paper then presents the new technique of semantic type (ST) indexing, based on JD indexing associated with each of 134 ST’s, and applying the standard cosine coefficient measure to compare the similarity between the JD indexing of a document and the JD indexing of each ST. The ST indexing of the document is the list of ST’s ranked in decreasing order of similarity between the JD indexing of the document and the JD indexing of the ST’s. Discussion of the potential usefulness and application of the very general indexing provided by JD’s and ST’s comprises the remainder of the paper. JD’s have been used for more than thirty years to search MEDLINE by discipline, and discipline-based indexing is in evidence on the Web. It is suggested, with several examples, that ST’s may convey a unique slant of a document’s content not normally represented in standard indexing vocabularies. Use of ST indexing to rank retrieved output is mentioned as a possible application. Notwithstanding the importance of methodology and performance issues, the intent of this paper is to explore questions of the potential utility and applicability of JD and ST indexing.
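The ST ranking step described in the abstract can be illustrated with a minimal sketch. It assumes that the JD indexing of a document, and of each ST, is available as a mapping from journal descriptor to a numeric score. The function names, the descriptor scores, and the choice of example STs below are hypothetical and are not taken from the paper. Only the cosine coefficient and the ranking in decreasing order of similarity come from the abstract.

```python
import math


def cosine(u, v):
    """Cosine coefficient between two sparse vectors stored as dicts."""
    dot = sum(score * v.get(jd, 0.0) for jd, score in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank_semantic_types(doc_jd, st_jd):
    """Return (ST, similarity) pairs sorted by decreasing similarity."""
    scores = {st: cosine(doc_jd, vec) for st, vec in st_jd.items()}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical JD indexing of one document (scores are made up).
doc_jd = {"Cardiology": 0.8, "Pharmacology": 0.2}

# Hypothetical JD indexing of two STs (scores are made up).
st_jd = {
    "Disease or Syndrome": {"Cardiology": 0.7, "Neurology": 0.3},
    "Pharmacologic Substance": {"Pharmacology": 0.9, "Cardiology": 0.1},
}

ranked = rank_semantic_types(doc_jd, st_jd)
for st, similarity in ranked:
    print(f"{st}: {similarity:.3f}")
```

In this sketch the ST indexing of the document is simply the `ranked` list. In the setting the abstract describes, the same comparison would run over all 134 STs, with vectors defined over the 127 JDs.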


Similar resources

A Two-Stage Split-and-Select Model for Automatic Indexing of Persian Texts

Purpose: Each language poses its own problems, so automatic indexing requires models suited to the language in question. Such models should address both the exhaustivity and the specificity of indexing. This paper introduces and evaluates a model suited to automatic indexing of Persian. The model proposes breaking the text into particles of candidate terms and to c...


Integrating knowledge from different sources for automatic back-of-the-book indexing

The paper reports research on automatic back-of-the-book indexing. It presents a methodology which brings together knowledge from different disciplines. It is inspired by human indexing methodology and the results are more similar to manually-crafted indexes than those produced by previous automatic approaches. Issues of evaluation and applications are addressed. Résumé : Cette communication pr...


Classification of Text, Automatic

Automatic text classification (ATC) is a discipline at the crossroads of information retrieval (IR), machine learning (ML), and computational linguistics (CL), and consists in the realization of text classifiers, i.e. software systems capable of assigning texts to one or more categories, or classes, from a predefined set. Applications range from the automated indexing of scientific articles, to...


Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Tw...


Mining Digital Imagery Data for Automatic Linguistic Indexing of Pictures

In this paper, we present a new research direction, automatic linguistic indexing of pictures, for data mining and machine learning researchers. Automatic linguistic indexing of pictures is an imperative but highly challenging problem. In our on-going research, we introduce a statistical modeling approach to this problem. Computer algorithms have been developed to mine numerical features automa...




Publication date: 1992